
Problem Note 50696: ANORM420 function provides options for correctly supporting the Arabic characters encoded in EBCDIC 420


Arabic is a complex script that is written from right to left. Each character can take one of four basic forms, depending on its position within the word: initial, medial, final, or isolated. Arabic also uses ligatures, which are single characters that combine two or more characters. Most notably, the lam-alef ligature consists of lam (initial or medial form) followed by alef (final form).

In some encodings, such as EBCDIC 420, the lam-alef ligature has a single code point, whereas others, such as Windows 1256 or ISO-8859-6, encode lam and alef as separate code points. When SAS transcodes data from EBCDIC 420 to one of the other Arabic single-byte encodings, the combined lam-alef character cannot be mapped to a single code point, so a substitution character is placed in the data instead.
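The mapping problem described above can be reproduced outside SAS. The following is a minimal Python sketch, not the ANORM420 implementation: it assumes the Unicode presentation-form code point U+FEFB as a stand-in for the single-code-point lam-alef ligature, and the helper names are hypothetical. It shows that the ligature has no slot in Windows 1256 or ISO-8859-6, and that splitting it into lam and alef first makes it encodable.

```python
import unicodedata

# Assumption: U+FEFB (ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM) stands in
# for the single-code-point ligature that EBCDIC 420 carries.
LAM_ALEF = "\uFEFB"


def encode_with_substitution(text, codec):
    """Encode text, letting the codec substitute '?' for unmappable characters."""
    return text.encode(codec, errors="replace")


def decompose_then_encode(text, codec):
    """Split the ligature into lam + alef (NFKC normalization), then encode."""
    return unicodedata.normalize("NFKC", text).encode(codec)


for codec in ("cp1256", "iso8859_6"):
    substituted = encode_with_substitution(LAM_ALEF, codec)
    decomposed = decompose_then_encode(LAM_ALEF, codec)
    print(codec, substituted.hex(), decomposed.hex())
```

In both target encodings the direct conversion yields a single substitution byte, whereas the decomposed form yields two bytes, one for lam and one for alef.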

SAS® 9.4 TS1M1 introduces a new function, ANORM420, which correctly maps the EBCDIC 420 lam-alef to two separate code points in your data. The function also has options that control whether a space is added after code points that represent the final form of a character and whether Arabic-Indic numerals are converted to digits.

The following example shows how to use the ANORM420 function.

    data _null_;
       a = '59CD57BC577745'x;
       s1 = anorm420(a);
       /* Turn off addition of space and mapping of Arabic-Indic numbers */
       s2 = anorm420(a, "si");
       /* Turn off transcoding */
       s3 = anorm420(a, 't');
       put s1= $hex20. / s2= $hex20. / s3= $hex20.;
    run;

Here is the resulting output in the SAS log:
    s1=C8E5C7E3C7D320A02020
    s2=C8E5C7E3C7D3A0202020
    s3=59CD57BC577740454040
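To make the three results easier to compare, the hexadecimal values from the log can be inspected byte by byte. The following Python sketch only analyzes the values printed above; the helper `first_difference` is a hypothetical name, and the reading of the trailing bytes as blank padding from the `$hex20.` format is an assumption.

```python
# Hex values copied from the example input and the SAS log output above.
original = bytes.fromhex("59CD57BC577745")
s1 = bytes.fromhex("C8E5C7E3C7D320A02020")
s2 = bytes.fromhex("C8E5C7E3C7D3A0202020")
s3 = bytes.fromhex("59CD57BC577740454040")


def first_difference(a, b):
    """Return the first offset at which two byte strings differ, or None."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None


# s1 (defaults) versus s2 (space insertion turned off).
i = first_difference(s1, s2)
print("s1 and s2 first differ at offset", i, "where s1 holds", hex(s1[i]))

# s3 (transcoding turned off) versus the untouched input bytes.
j = first_difference(s3, original)
print("s3 and the input first differ at offset", j, "where s3 holds", hex(s3[j]))

# Removing the extra byte from s1 and s3 recovers s2 and the input respectively.
print((s1[:i] + s1[i + 1:]) == s2[:len(s1) - 1])
print((s3[:j] + s3[j + 1:]).startswith(original))
```

With the default options, s1 contains one extra byte (0x20) compared with s2. With transcoding turned off, s3 keeps the original EBCDIC bytes and gains a single 0x40, which is the EBCDIC space.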

Click the Hot Fix tab in this note to access the hot fix for this issue.



Operating System and Release Information

| Product Family | Product | System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
|---|---|---|---|---|---|---|
| SAS System | Base SAS | Z64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft® Windows® for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft® Windows® for 64-Bit Itanium-based Systems | 9.4 | | 9.4 TS1M0 | |
| | | Microsoft Windows 8 Enterprise 32-bit | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft Windows 8 Enterprise x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft Windows 8 Pro 32-bit | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft Windows 8 Pro x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft Windows Server 2008 R2 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft Windows Server 2008 for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft Windows Server 2012 Datacenter | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Microsoft Windows Server 2012 Std | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Windows 7 Enterprise x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Windows 7 Professional x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | 64-bit Enabled AIX | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | 64-bit Enabled HP-UX | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | 64-bit Enabled Solaris | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | HP-UX IPF | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Linux for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |
| | | Solaris for x64 | 9.4 | 9.4_M1 | 9.4 TS1M0 | 9.4 TS1M1 |

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.